The Distributionally Robust Optimization Model of Sparse Principal Component Analysis
Wang, Lei, Liu, Xin, Chen, Xiaojun
We consider sparse principal component analysis (PCA) in a stochastic setting where the underlying probability distribution of the random parameter is uncertain. This problem is formulated as a distributionally robust optimization (DRO) model based on a constructive approach to capturing uncertainty in the covariance matrix, which yields a nonsmooth constrained min-max optimization problem. We further prove that the inner maximization problem admits a closed-form solution, which allows us to reformulate the original DRO model as an equivalent minimization problem on the Stiefel manifold. This transformation leads to a Riemannian optimization problem with intricate nonsmooth terms, a challenging formulation beyond the reach of existing algorithms. To address this issue, we devise an efficient smoothing manifold proximal gradient algorithm. We prove the Riemannian gradient consistency and the global convergence of our algorithm to a stationary point of the nonsmooth minimization problem, and we establish its iteration complexity. Finally, numerical experiments validate the effectiveness and scalability of our algorithm and highlight the necessity and rationale of adopting the DRO model for sparse PCA.
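For orientation, a representative form of such a DRO sparse PCA model (a sketch in generic notation; the paper's exact ambiguity set and sparsity penalty may differ) is

$$\min_{X \in \mathrm{St}(n,p)} \; \max_{\Sigma \in \mathcal{U}} \; -\operatorname{tr}\!\big(X^\top \Sigma X\big) + \lambda \lVert X \rVert_1,$$

where $\mathrm{St}(n,p) = \{X \in \mathbb{R}^{n \times p} : X^\top X = I_p\}$ is the Stiefel manifold, $\mathcal{U}$ is an uncertainty set of covariance matrices, and the $\ell_1$ term promotes sparse loadings; a closed-form inner maximum is what collapses such a min-max into a single nonsmooth minimization over $\mathrm{St}(n,p)$.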
Learning Against Distributional Uncertainty: On the Trade-off Between Robustness and Specificity
Wang, Shixiong, Wang, Haowei, Honorio, Jean
Trustworthy machine learning aims at combating distributional uncertainty between the training data distribution and the population distribution. Typical treatment frameworks include the Bayesian approach, (min-max) distributionally robust optimization (DRO), and regularization. However, two issues arise: 1) all of these methods are biased estimators of the true optimal cost; and 2) the prior distribution in the Bayesian method, the radius of the distributional ball in the DRO method, and the regularizer in the regularization method are difficult to specify. This paper studies a new framework that unifies the three approaches and addresses the two challenges above. We study the asymptotic properties (e.g., consistency and asymptotic normality), non-asymptotic properties (e.g., unbiasedness and generalization error bounds), and a Monte Carlo-based solution method of the proposed model. The new model reveals the trade-off between robustness to unseen data and specificity to the training data.
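As a reference point for the three frameworks being unified (textbook forms in generic notation, not the paper's):

$$\underbrace{\min_x \, \mathbb{E}_{P \sim \pi}\,\mathbb{E}_{\xi \sim P}[f(x,\xi)]}_{\text{Bayesian}} \qquad \underbrace{\min_x \, \sup_{P \in B_\varepsilon(\hat{P}_n)} \mathbb{E}_{\xi \sim P}[f(x,\xi)]}_{\text{DRO}} \qquad \underbrace{\min_x \, \mathbb{E}_{\xi \sim \hat{P}_n}[f(x,\xi)] + \lambda\, r(x)}_{\text{regularization}},$$

where the prior $\pi$, the radius $\varepsilon$, and the regularizer $\lambda\, r(\cdot)$ are exactly the three quantities the abstract flags as difficult to specify.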
Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints
Li, Jiajin, Lin, Sirui, Blanchet, Jose, Nguyen, Viet Anh
Distributionally robust optimization has been shown to offer a principled way to regularize learning models. In this paper, we find that Tikhonov regularization is distributionally robust in an optimal transport sense (i.e., if an adversary chooses distributions in a suitable optimal transport neighborhood of the empirical measure), provided that suitable martingale constraints are also imposed. Further, we introduce a relaxation of the martingale constraints which not only provides a unified viewpoint on a class of existing robust methods but also leads to new regularization tools. To realize these novel tools, we propose tractable computational algorithms. As a byproduct, the strong duality theorem proved in this paper can potentially be applied to other problems of independent interest.
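For concreteness, Tikhonov (ridge) regularization and the generic optimal-transport-robust objective it is being related to take the familiar forms (a sketch; the paper's martingale-constrained neighborhood is what makes the correspondence exact):

$$\min_\beta \; \frac{1}{n}\sum_{i=1}^n \big(y_i - x_i^\top \beta\big)^2 + \lambda \lVert \beta \rVert_2^2 \qquad \text{vs.} \qquad \min_\beta \; \sup_{Q:\, W_c(Q, \hat{P}_n) \le \varepsilon} \mathbb{E}_Q\big[(y - x^\top \beta)^2\big],$$

where $W_c$ is an optimal transport discrepancy with ground cost $c$ and $\hat{P}_n$ is the empirical measure.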
Distributionally Robust Classifiers in Sentiment Analysis
Li, Shilun, Li, Renee, Zhang, Carina
In this paper, we propose sentiment classification models based on BERT integrated with distributionally robust optimization (DRO) to improve model performance on datasets with distributional shifts. We added a two-layer Bi-LSTM, a projection layer (onto the simplex or an $L_p$ ball), and a linear layer on top of BERT to achieve distributional robustness. We considered one form of distributional shift (training on the IMDb dataset and testing on the Rotten Tomatoes dataset). Experiments confirm that our DRO model improves performance on a test set whose distribution shifts from that of the training set.
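The projection layer described above can be realized with the standard Euclidean projection onto the probability simplex; below is a minimal sketch of that classic routine (Duchi et al., 2008), not the authors' code:

```python
import numpy as np

def project_onto_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w_i >= 0, sum_i w_i = 1}."""
    u = np.sort(v)[::-1]                                  # sort descending
    css = np.cumsum(u)                                    # running sums
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]    # largest feasible index
    theta = (1.0 - css[rho]) / (rho + 1.0)                # shift enforcing the constraints
    return np.maximum(v + theta, 0.0)

# Example: an arbitrary score vector becomes a valid probability vector.
print(project_onto_simplex(np.array([0.8, 1.2, -0.3])))  # -> [0.3, 0.7, 0.0]
```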
Worst-case sensitivity
Gotoh, Jun-ya, Kim, Michael Jong, Lim, Andrew E. B.
We introduce the notion of Worst-Case Sensitivity, defined as the worst-case rate of increase in the expected cost of a Distributionally Robust Optimization (DRO) model as the size of the uncertainty set vanishes. We show that worst-case sensitivity is a Generalized Measure of Deviation and that a large class of DRO models are essentially mean-(worst-case) sensitivity problems when uncertainty sets are small, unifying recent results on the relationship between DRO and regularized empirical optimization, with worst-case sensitivity playing the role of the regularizer. More generally, DRO solutions can be sensitive to the family and size of the uncertainty set, and they reflect the properties of its worst-case sensitivity. We derive closed-form expressions of worst-case sensitivity for well-known uncertainty sets, including smooth $\phi$-divergence, total variation, "budgeted" uncertainty sets, uncertainty sets corresponding to a convex combination of expected value and CVaR, and the Wasserstein metric. These expressions can be used to select the uncertainty set and its size for a given application.
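In symbols, and transcribing the verbal definition above (notation not the paper's), worst-case sensitivity is the one-sided derivative of the robust cost in the radius $\delta$ of the uncertainty set $\mathcal{U}_\delta(P)$ at $\delta = 0$:

$$\mathrm{WCS}(c) \;=\; \lim_{\delta \downarrow 0} \; \frac{1}{\delta}\left( \sup_{Q \in \mathcal{U}_\delta(P)} \mathbb{E}_Q[c] - \mathbb{E}_P[c] \right),$$

so that for small uncertainty sets the DRO objective behaves like $\mathbb{E}_P[c] + \delta \cdot \mathrm{WCS}(c)$, i.e., the mean cost regularized by worst-case sensitivity.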
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Sagawa, Shiori, Koh, Pang Wei, Hashimoto, Tatsunori B., Liang, Percy
Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, their poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization---stronger-than-typical $\ell_2$ regularization or early stopping---we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is critical for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce and give convergence guarantees for a stochastic optimizer for the group DRO setting, underpinning the empirical study above.
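The group DRO optimizer referenced at the end maintains a weight per group and up-weights whichever groups currently incur high loss; below is a minimal NumPy sketch of that style of exponentiated-gradient update (illustrative only, not the paper's exact algorithm or hyperparameters):

```python
import numpy as np

def group_dro_step(params, grads_per_group, losses_per_group, q,
                   eta_q=0.01, eta_w=0.1):
    """One sketch step of group DRO: exponentiated-gradient ascent on the
    group weights q, then gradient descent on the q-weighted loss."""
    q = q * np.exp(eta_q * losses_per_group)   # high-loss groups gain weight
    q = q / q.sum()                            # renormalize onto the simplex
    weighted_grad = sum(w * g for w, g in zip(q, grads_per_group))
    return params - eta_w * weighted_grad, q

# q is initialized uniform over groups and drifts toward the worst group.
```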
Distributionally Robust Optimization: A Review
Rahimian, Hamed, Mehrotra, Sanjay
The concepts of risk aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by building on these concepts. A modeling framework called distributionally robust optimization (DRO) has recently received significant attention in both the operations research and statistical learning communities. This paper surveys the main concepts of and contributions to DRO, and its relationships with robust optimization, risk aversion, chance-constrained optimization, and function regularization.
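The framework being surveyed centers on the canonical DRO problem

$$\min_{x \in \mathcal{X}} \; \sup_{P \in \mathcal{P}} \; \mathbb{E}_{\xi \sim P}\big[f(x, \xi)\big],$$

where $\mathcal{P}$ is an ambiguity set of probability distributions: a singleton $\mathcal{P}$ recovers stochastic programming, while taking $\mathcal{P}$ to be all distributions on a given support set recovers classical robust optimization, which is one way to see the relationships the survey traces.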